Hierarchical Clustering via Penalty-Based Aggregation and the Genie Approach

نویسندگان

  • Marek Gagolewski
  • Anna Cena
  • Maciej Bartoszuk
چکیده

The paper discusses a generalization of the nearest centroid hierarchical clustering algorithm. A first extension deals with the incorporation of generic distance-based penalty minimizers instead of the classical aggregation by means of centroids. Due to that the presented algorithm can be applied in spaces equipped with an arbitrary dissimilarity measure (images, DNA sequences, etc.). Secondly, a correction preventing the formation of clusters of too highly unbalanced sizes is applied: just like in the recently introduced Genie approach, which extends the single linkage scheme, the new method averts a chosen inequity measure (e.g., the Gini-, de Vergottini-, or Bonferroni-index) of cluster sizes from raising above a predefined threshold. Numerous benchmarks indicate that the introduction of such a correction increases the quality of the resulting clusterings significantly.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Clustering Based Location-allocation Problem Considering Transportation Costs and Statistical Properties (RESEARCH NOTE)

Cluster analysis is a useful technique in multivariate statistical analysis. Different types of hierarchical cluster analysis and K-means have been used for data analysis in previous studies. However, the K-means algorithm can be improved using some metaheuristics algorithms. In this study, we propose simulated annealing based algorithm for K-means in the clustering analysis which we refer it a...

متن کامل

Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm

The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. Such a constraint, for larger data sets, puts at a disadvantage the use of all the classical linkage criteria but the single linkage one. However, it is known that the single linkage clustering algorithm is very sensitive to outliers, produces h...

متن کامل

Online Aggregation of Coherent Generators Based on Electrical Parameters of Synchronous Generators

This paper proposes a novel approach for coherent generators online clustering in a large power system following a wide area disturbance. An interconnected power system may become unstable due to severe contingency when it is operated close to the stability boundaries. Hence, the bulk power system controlled islanding is the last resort to prevent catastrophic cascading outages and wide area bl...

متن کامل

به کارگیری روش‌های خوشه‌بندی در ریزآرایه DNA

Background: Microarray DNA technology has paved the way for investigators to expressed thousands of genes in a short time. Analysis of this big amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering technique in microarray DNA analysis. Materials and methods: We analyzed data of Van’t Veer et al study dealing with BRCA1...

متن کامل

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016